Skip to content

Fix: unaligned stack pointer can cause crashes in jump_fcontext()#385

Open
zhangh43 wants to merge 15 commits intomainfrom
overflow
Open

Fix: unaligned stack pointer can cause crashes in jump_fcontext()#385
zhangh43 wants to merge 15 commits intomainfrom
overflow

Conversation

@zhangh43
Copy link
Contributor

@zhangh43 zhangh43 commented Dec 20, 2025

fix(coro): align stack pointer correctly and adjust stack size

The previous stack initialization logic failed to recompute the stack size
after aligning the stack pointer (SP) down to a 16-byte boundary. This
created a potential "Bounds Mismatch" where the logical stack bottom
(sp - size) could extend into the guard page or unallocated memory.

Changes:

  • Calculate sp by masking down to 16-byte alignment (Required for x86-64
    and mandatory for ARMv8/A64).
  • Dynamically compute sc.size based on the difference between the aligned
    sp and the bottom (guard page boundary).
  • Ensures the usable stack range [sp - size, sp) never overlaps with the
    protected guard page.

High Address
+--------------------------+ <--- _coroStack + kCoroStackSize (Original Top)
| Alignment Gap (0-15B) | : This space is discarded by "sp = top & ~15"
+--------------------------+ <--- sc.sp (Aligned Stack Pointer)
| | ^
| | |
| Usable Stack | | sc.size = (sp - bottom)
| [sp - size, sp) | |
| | |
| | v
+--------------------------+ <--- bottom (_coroStack + _osPageSize)
| |
| Guard Page | : Protected by PROT_NONE
| (Safety Zone) |
| |
+--------------------------+ <--- _coroStack (Allocation Base)
Low Address

Summary by CodeRabbit

  • Bug Fixes

    • Increased coroutine stack size (to 8MB) to reduce stack-overflow risk in deep/large operations.
    • Escalated transaction-related log messages to error level for clearer visibility.
    • Added stronger validation before resuming/yielding to prevent invalid resumptions and crashes.
  • Refactor

    • Improved coroutine/task lifetime, ownership, resume/yield and teardown handling with safer weak-reference guards.
  • Chores

    • Updated a subproject reference with no functional changes.

✏️ Tip: You can customize this high-level summary in your review settings.


Note

Fixes coroutine stack alignment/size and task lifetimes, and hardens ServiceStateMachine coroutine yield/resume and teardown to prevent crashes/UAF.

  • Coroutine/SSM:
    • Align coroutine stack pointer to 16 bytes and recompute sc.size with a guard page; add _coroSink member and use it instead of referencing stack-local continuations; rethrow forced_unwind.
    • Use weak_from_this() for queued resume/migrate functors; add null checks before invoking _coroResume/_coroYield; disable functors and clear state in reset/destructor/cleanup.
    • Increase coroutine stack to 8192 * 1024 bytes (8MB) in ServiceStateMachine and CoroCtx.
  • Executor:
    • Capture Task by value in coroutineResumeFunctor/coroutineLongResumeFunctor to avoid dangling references.
  • Sessions/Logging:
    • Update abort-transaction coroutine logs from log() to error() during yield/resume.

Written by Cursor Bugbot for commit 3b25393. This will update automatically on new commits. Configure here.

@coderabbitai
Copy link

coderabbitai bot commented Dec 20, 2025

Walkthrough

Increase coroutine stack size to 8MB, add guard-page and 16-byte alignment for coroutine stacks, introduce a _coroSink continuation and weak-capture safety for coroutine lambdas, fix Task lifetime captures, escalate two log calls to error, and update an eloq submodule gitlink.

Changes

Cohort / File(s) Summary
Stack Size Constant
src/mongo/transport/service_state_machine.h, src/mongo/db/kill_sessions_local.cpp
Raise ServiceStateMachine::kCoroStackSize from 3200 * 1024 to 8192 * 1024 (8MB).
Stack Context, Guard Page & Alignment
src/mongo/transport/service_state_machine.cpp
Compute stack bottom/top, reserve a PROT_NONE guard page, align stack pointer to 16 bytes, and set coroutine sc.sp/sc.size.
Coroutine Sink & Weak Safety
src/mongo/transport/service_state_machine.cpp, src/mongo/transport/service_state_machine.h
Add private _coroSink (boost::context::continuation), store/clear sink during callcc/yield, convert resume/yield handlers to use weak_from_this() with locking guards, and clear coroutine functors during teardown.
Task Lifetime Fix in Executor
src/mongo/transport/service_executor_coroutine.cpp
Capture Task by creating a local copy and moving it into returned mutable lambdas to avoid dangling-reference UAFs.
Logging Level Elevation
src/mongo/db/kill_sessions_local.cpp
Change two logging calls in abortArbitraryTransactionIfExpired (resume and yield paths) from log to error level.
Submodule Update
src/mongo/db/modules/eloq/data_substrate
Advance tracked commit hash for eloq/data_substrate submodule (gitlink update only).

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant SSM as ServiceStateMachine
  participant BoostCtx as Boost.Context
  participant Executor

  Client->>SSM: schedule task / request resume (uses weak_from_this())
  Executor->>SSM: invoke resume lambda (captures weak_this)
  alt SSM alive
    SSM->>BoostCtx: callcc -> allocate stack (8MB) with guard page, store _coroSink
    BoostCtx-->>SSM: coroutine runs (control transferred)
    SSM->>BoostCtx: yield/resume via stored _coroSink (validate client/_coroSink)
    BoostCtx-->>SSM: returns control
  else SSM expired
    Executor-->>Executor: lambda no-op after weak_ptr lock fails
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

trigger-ci

Suggested reviewers

  • xiexiaoy
  • lzxddz

Poem

🐇 I padded my stack to hop with care,
A guarded page waits just over there.
Lambdas hold Tasks that used to stray,
Weak locks make sure no one runs away.
Hop, test, merge — then nibble carrot flair.

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main objective of the PR: fixing unaligned stack pointer issues in coroutine stack initialization that cause crashes in jump_fcontext().
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch overflow

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3d34a65 and 2696c5e.

📒 Files selected for processing (1)
  • src/mongo/db/modules/eloq/data_substrate
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/mongo/db/modules/eloq/data_substrate

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/mongo/db/kill_sessions_local.cpp (1)

116-116: Reconsider the log level for coroutine flow messages.

These messages describe normal coroutine execution flow (resume/yield operations), not error conditions. Elevating them to error() level may:

  • Pollute error logs with non-error information
  • Make it harder to identify actual errors
  • Trigger unnecessary alerts in monitoring systems

Consider using LOG(1) or LOG(3) for debug-level logging instead, or remove these logs entirely if they were added temporarily for debugging the crash investigation.

🔎 Suggested fix
-                error() << "abortArbitraryTransactionIfExpired call resume.";
+                LOG(3) << "abortArbitraryTransactionIfExpired call resume.";
-                            error() << "abortArbitraryTransactionIfExpired call yield.";
+                            LOG(3) << "abortArbitraryTransactionIfExpired call yield.";

Also applies to: 134-134

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8c34ce and 60eba8c.

📒 Files selected for processing (3)
  • src/mongo/db/kill_sessions_local.cpp (3 hunks)
  • src/mongo/transport/service_state_machine.cpp (2 hunks)
  • src/mongo/transport/service_state_machine.h (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
🔇 Additional comments (4)
src/mongo/transport/service_state_machine.h (1)

285-287: Stack size increase is well-justified.

The increase from 3.2MB to 8MB with the explanatory comment clearly addresses the stack overflow concerns mentioned in the PR objectives. The documentation helps future maintainers understand the rationale.

src/mongo/transport/service_state_machine.cpp (2)

58-58: Appropriate header inclusion for alignment logic.

The <cstdint> header is required for uintptr_t used in the stack pointer alignment at line 752.


735-758: Excellent fix for the stack alignment and bounds issue.

The rework correctly addresses the core problem described in the PR:

  1. Guard-page setup: Reserves the first OS page as PROT_NONE (set in constructor at line 272).
  2. Boundary calculation: Computes usable range [bottom, top) where bottom = _coroStack + _osPageSize.
  3. Alignment: Aligns sp down to 16-byte boundary using bitwise masking, which is required for x86-64 and ARMv8 ABIs.
  4. Size recomputation (key fix): sc.size = sp - bottom ensures the usable stack range [sp - size, sp) never overlaps the guard page, even after alignment adjustment.

The invariant at line 754 provides a safety check. Given typical page sizes (4KB) and the stack size (8MB), the invariant should hold even with the maximum 15-byte alignment reduction.

src/mongo/db/kill_sessions_local.cpp (1)

77-79: Stack size increase is consistent and well-documented.

The change aligns with the updated kCoroStackSize in service_state_machine.h and maintains consistency across coroutine implementations.

ServiceExecutorCoroutine::coroutineResumeFunctor() and coroutineLongResumeFunctor() captured the task parameter by reference, leaving lambdas with a dangling reference after the function returned. Under load this could corrupt memory and crash later (e.g. in munmap).
Capture Task by value (copy) inside the returned functors to ensure correct lifetime.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/mongo/transport/service_executor_coroutine.cpp (2)

319-324: Excellent fix for the dangling reference issue!

The change correctly captures task by value (via copy + move) instead of by reference, eliminating the dangling reference when the lambda outlives the function scope.

However, consider moving task when passing it to resumeTask to avoid an unnecessary copy:

-    return [thd_group = &_threadGroups[threadGroupId], task = std::move(taskCopy)]() mutable {
-        thd_group->resumeTask(task);
+    return [thd_group = &_threadGroups[threadGroupId], task = std::move(taskCopy)]() mutable {
+        thd_group->resumeTask(std::move(task));
     };

Since resumeTask takes Task by value and internally moves it, passing task as an lvalue causes a copy. Moving directly from the lambda's captured task is more efficient, especially if Task (likely std::function<void()>) has non-trivial copy semantics.


330-334: Good fix for the dangling reference issue!

The capture-by-value approach correctly prevents dangling references. Same performance suggestion as in coroutineResumeFunctor: move task when passing it to enqueueTask:

-    return [thd_group = &_threadGroups[threadGroupId], task = std::move(taskCopy)]() mutable {
-        thd_group->enqueueTask(task);
+    return [thd_group = &_threadGroups[threadGroupId], task = std::move(taskCopy)]() mutable {
+        thd_group->enqueueTask(std::move(task));
     };

This avoids copying task when enqueueTask (which takes Task by value) is called.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c994e74 and dcc7547.

📒 Files selected for processing (1)
  • src/mongo/transport/service_executor_coroutine.cpp (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.

Fix potential use-after-free during coroutine resume/yield by avoiding raw this captures in queued resume tasks. Use weak_from_this() for _resumeTask and thread-group migration functor so queued callbacks become no-ops after ServiceStateMachine destruction.
Additionally, disable Client coroutine functors and clear SSM coroutine callbacks in _cleanupSession() to prevent yield/resume calls racing with coroutine stack teardown.
ServiceStateMachine stored _coroYield as a member functor but captured &sink from the boost::context::callcc lambda, leaving a dangling reference once the lambda frame changed/ended. This could crash in jump_fcontext() with unreadable backtraces.
Store the caller continuation (sink) in a new member _coroSink and implement _coroYield in terms of _coroSink.resume(). Clear _coroSink during reset/cleanup/destruction to avoid resuming stale continuations.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/mongo/transport/service_state_machine.cpp (1)

758-781: Excellent fix for the stack alignment bug—this is the core of the PR.

The rewrite correctly addresses the critical issue described in the PR objectives:

What was fixed:

  • Line 770-771: Explicitly computes bottom (guard page boundary) and top (stack end)
  • Line 774-775: Aligns sp down to a 16-byte boundary (required for x86-64 and mandatory for ARMv8/A64 ABIs)
  • Line 779: Crucially recomputes size = sp - bottom to reflect the usable range after alignment

Why this matters:
Previously, aligning sp down without recomputing size meant the logical stack range [sp - old_size, sp) could extend below bottom into the guard page or unallocated memory, causing segmentation faults.

Now, the usable stack range is correctly [sp - size, sp) = [bottom, sp), which never overlaps the protected guard page at [_coroStack, bottom).

Line 777: The invariant(sp > bottom) is a good safety check ensuring alignment doesn't cause overlap.

Optional: Add compile-time check for stack size vs page size

While extremely unlikely in practice (8MB stack vs typical 4KB page), consider adding a static assertion for defensive programming:

 boost::context::stack_context ServiceStateMachine::_coroStackContext() {
+    static_assert(kCoroStackSize > 2 * 4096,  // Assume max page size of 4KB
+                  "Coroutine stack must be substantially larger than OS page size");
     boost::context::stack_context sc;

This would catch any future misconfiguration at compile time.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dcc7547 and f6c1d13.

📒 Files selected for processing (2)
  • src/mongo/transport/service_state_machine.cpp (8 hunks)
  • src/mongo/transport/service_state_machine.h (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
🧬 Code graph analysis (1)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (6)
  • releaseCurrent (164-167)
  • releaseCurrent (164-164)
  • haveClient (160-162)
  • haveClient (160-160)
  • getCurrent (151-153)
  • getCurrent (151-151)
🔇 Additional comments (6)
src/mongo/transport/service_state_machine.h (1)

291-294: Excellent design to prevent dangling references.

Storing the caller continuation as a member variable ensures that yield/resume functors never capture references to stack-local continuation variables, eliminating a class of potential use-after-free bugs. This is a solid improvement to the coroutine lifecycle management.

src/mongo/transport/service_state_machine.cpp (5)

58-58: LGTM: Include added for stack pointer alignment.

The <cstdint> include is necessary for uintptr_t used in the stack alignment calculation (Line 775).


281-281: LGTM: Proper cleanup of continuation member.

Resetting _coroSink in both the destructor and reset() ensures stale continuations cannot be accidentally resumed, preventing use-after-free scenarios.

Also applies to: 306-306


549-583: Excellent use-after-free protection for queued coroutine tasks.

The switch from raw this captures to weak_from_this() is a critical safety improvement. Resume tasks can be queued and executed asynchronously, so capturing a weak pointer ensures that if the ServiceStateMachine is destroyed before the task executes, the callback safely becomes a no-op rather than accessing freed memory.

The pattern is correctly applied:

  1. Capture weak_from_this() in the lambda
  2. Lock to obtain a shared_ptr
  3. Early return if the SSM has been destroyed
  4. Only then access SSM members

591-614: LGTM: Coroutine continuation handling is now lifetime-safe.

The changes correctly implement the design outlined in the header comments:

  1. Line 598: Stores the caller continuation (sink) as _coroSink member, preventing dangling references to stack-local variables
  2. Lines 602-610: The yield functor properly:
    • Captures weak_from_this() for safety
    • Releases the current client (Line 608) before yielding
    • Resumes via the stored _coroSink member (Line 609)
    • Updates _coroSink with the resumed continuation
  3. Line 613: Returns the stored continuation

This prevents both use-after-free and dangling reference issues in coroutine lifecycle management.


813-824: LGTM: Enhanced teardown sequence prevents coroutine races during destruction.

The early disabling of coroutine functors (Lines 815-817) is an important safety measure that prevents any yield/resume calls from racing with the destruction of the SSM and its coroutine stack. Clearing all coroutine-related members (Lines 818-823), including the new _coroSink, ensures complete cleanup.

This defensive approach eliminates potential use-after-free scenarios during the teardown phase.

Comment on lines +285 to +287
// Coroutine stack size: Increased from 3.2MB to 8MB to prevent stack overflow
// in complex database operations (deep recursion, large BSON processing, etc.)
static constexpr size_t kCoroStackSize = 8192 * 1024; // 8MB
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for kCoroStackSize and related stack size references in codebase
echo "=== Searching for kCoroStackSize references ==="
rg -n "kCoroStackSize" --type=cpp

echo -e "\n=== Searching for coroutine stack size related code ==="
rg -n "CoroStackSize|coroutine.*stack|stack.*size" --type=cpp -i | head -50

echo -e "\n=== Checking service_state_machine.h file context ==="
rg -n -B5 -A5 "kCoroStackSize" --type=cpp

echo -e "\n=== Searching for tests related to coroutine stack or memory ==="
rg -n "coroutine|stack.*overflow|memory.*profile" --type=cpp -g '*test*' -g '*unittest*' | head -30

Repository: eloqdata/eloqdoc

Length of output: 13016


Document memory impact analysis and load testing for 8MB coroutine stack increase.

The 2.5x increase in coroutine stack size (3.2MB → 8MB) is implemented consistently across two critical code paths: service_state_machine.h and kill_sessions_local.cpp. Each mmap allocation of 8MB per connection introduces measurable memory overhead for deployments with thousands of concurrent connections.

The code rationale (preventing stack overflow in deep recursion and large BSON processing) is documented in both locations, but the following verification gaps remain:

  • No load testing results documenting memory consumption at scale
  • No configurable option to adjust stack size for different deployment requirements
  • No profiling data confirming actual stack usage justifies the 8MB threshold

These should be addressed in deployment documentation or as follow-up work, particularly for high-connection-count deployments where memory efficiency is critical.

🤖 Prompt for AI Agents
In src/mongo/transport/service_state_machine.h around lines 285 to 287, the
hardcoded coroutine stack size was increased to 8MB but lacks documentation,
load-test data, profiling evidence, and a runtime/configurable option; update
the code and repo as follows: add a clear note in the file header referencing an
external design/doc page that contains memory impact analysis and load-test
results for thousands of connections, expose the stack size as a configurable
parameter (compile-time constant fallback plus runtime override via config file
or env var/CLI flag) with validated bounds and a sensible default of 8MB, add a
short profiling hook or sample code comment showing how to capture maximum stack
usage during tests, and add a task in the repo docs (or CHANGELOG) linking to
completed profiling and load-test artifacts so operators can evaluate and tune
stack size for high-connection deployments.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
src/mongo/transport/service_state_machine.cpp (1)

772-795: Stack alignment logic is correct; consider adding minimum size check.

The 16-byte alignment (lines 788-789) and recomputed sc.size = sp - bottom (line 793) correctly ensure the usable range [sp - size, sp) never overlaps the guard page. The invariant at line 791 guards against alignment underflow.

However, after alignment you could lose up to 15 bytes. If kCoroStackSize is small or _osPageSize is large, the usable stack might be insufficient.

Consider adding a minimum size assertion
 invariant(sp > bottom);
+constexpr size_t kMinStackSize = 4096;  // or another safe threshold
+invariant(static_cast<size_t>(sp - bottom) >= kMinStackSize);
 sc.sp = sp;
 sc.size = static_cast<size_t>(sp - bottom);
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f6c1d13 and 8011a5e.

📒 Files selected for processing (1)
  • src/mongo/transport/service_state_machine.cpp (10 hunks)
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: githubzilla
Repo: eloqdata/eloqdoc PR: 211
File: src/mongo/db/modules/eloq/cmake/build_eloq_log_service.cmake:26-80
Timestamp: 2025-09-25T12:24:06.434Z
Learning: The user githubzilla implemented a fix in commit fe98aaf to address the MEMORY state incorrectly triggering RocksDB discovery in build_eloq_log_service.cmake. This was a control flow issue where MEMORY builds were falling into an else branch that still attempted RocksDB discovery.
🧬 Code graph analysis (1)
src/mongo/transport/service_state_machine.cpp (1)
src/mongo/db/client.cpp (8)
  • setCurrent (169-172)
  • setCurrent (169-169)
  • haveClient (160-162)
  • haveClient (160-160)
  • releaseCurrent (164-167)
  • releaseCurrent (164-164)
  • getCurrent (151-153)
  • getCurrent (151-151)
🔇 Additional comments (6)
src/mongo/transport/service_state_machine.cpp (6)

280-282: LGTM: Proper cleanup order.

Clearing _coroSink before unmapping the coroutine stack ensures the continuation doesn't hold references to memory being freed.


478-483: LGTM: Correctly disables yield after request completion.

The comment accurately describes the safety contract: coroutine functors are only valid during active request processing within the callcc() context. Clearing them after _coroStatus = Empty ensures Client::coroutineFunctors() returns Unavailable between requests, preventing misuse.


555-593: Excellent UAF prevention using weak_ptr capture.

The weak pointer pattern across _resumeTask (lines 558-583) and _coroMigrateThreadGroup (lines 589-593) prevents use-after-free during asynchronous resume execution. The early-return guards (lines 560-562, 590) and null-checks before invoking functors (lines 566-571) provide solid defense-in-depth.


606-624: LGTM: Robust yield implementation with safety invariants.

Storing the sink continuation as _coroSink (line 608) prevents dangling references to stack-local variables. The weak-capture in _coroYield (lines 612-616) combined with invariants (lines 618-621) ensure yielding only occurs within a valid, active coroutine context.


803-808: LGTM: Defensive guards during migration.

Checking functor validity before invoking _coroResume and _coroYield prevents null dereference if concurrent teardown cleared the functors. This is consistent with the guarded pattern in _runNextInGuard (lines 566-571).


831-842: Excellent: Aggressive functor teardown prevents races.

Setting CoroutineFunctors::Unavailable (line 834) before clearing individual members (lines 836-841) ensures no code can invoke yield/resume during SSM destruction. This is critical for preventing use-after-free when the coroutine stack is unmapped in the destructor.

@CLAassistant
Copy link

CLAassistant commented Dec 21, 2025

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
4 out of 5 committers have signed the CLA.

✅ zhangh43
✅ xiexiaoy
✅ githubzilla
✅ thweetkomputer
❌ Ubuntu


Ubuntu seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you have already a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants